perm filename META.RMS[E81,JMC]1 blob sn#602248 filedate 1981-07-21 generic text, type T, neo UTF8
Stallman draft FTPed from mit-ai rms;meta 6 on 1981 July 21
@c[-*-scribe-*-]
Outline:
Modal logic
Meta logic

@Chapter[Defaults]

A default is a policy of holding a certain sort of belief in
situations where it is not definitely known to be true.
For example, the policy of believing that a bird can fly unless this
is known to be false is a default.

The way you would use that default in practice is, whenever you think
about a bird, you would feel free to believe that it could fly, unless
and until you had some reason to believe it could not fly.  In this
paper, I am not attempting to deal with questions of when to perform
inferences; ignoring such issues, it is simpler to say: whenever you
think about a bird, deduce that it can fly, unless you believe that it
cannot fly.

A default is usually formalized as something which allows @B[p] to be
deduced as a theorem only if @B[-p] is not a theorem.

@Section[Non-monotonicity]

Defaults make the reasoning system @i[non-monotonic].  This is to say,
that adding one new belief can cause other beliefs to be retracted.
Ordinary logic is monotonic, which means that adding a new axiom can
add new theorems, but cannot remove any old ones.  (It can add
negations of old theorems as new theorems, making the theory
contradictory, but that does not constitute removing old theorems.)
When there is a default, this is not so.  Adding the axiom @B[B
cannot fly] causes the statement @b[B can fly] to cease to be provable,
because its proof uses the default, and depends on the fact that @B[B
cannot fly] is not a theorem.

The classical theory of logic depends heavily on monotonicity.  It is
possible to enumerate all theorems by enumerating all possible proofs.
This makes it a recursively enumerable set.  In non-monotonic logic, a
proof is not a finite object; it can depend on the absence of any
proof of the refutation of some default.  So it is hard to say much of
a mathematical nature about the theory.  This makes monotonic logic
more interesting from the mathematician's point of view.  From ours,
however, we cannot afford to deny non-monotonic logic a place in our
logical world.  Once we know that people can engage in non-monotonic
reasoning, to ban it from our model of thought is automatically to
make our model inadequate.

@Section[Defaults in Modal Logic]

Doyle and McDermott expressed defaults in modal logic as statements of
the form

@example[@B[-L-p => p]]

In other words, if you have not deduced @B[-p], deduce @B[p] from
that.

However, this is not enough by itself.  In ordinary modal theories, no
statement of the form @B[-Lq] is ever a theorem unless it can be
deduced from @B[-q], which must already be a
theorem.  So the default would never allow its conclusion to be
deduced unless it were already a theorem.  Actually, it is clear that
the statement above cannot be enough, because it does not introduce
non-monotonicity.  We can't possibly have defaults that do their job
without putting in non-monotonicity somehow.

What is needed is a way to get theorems of the form @B[-L-p] whenever
@b[-p] is @i[not] a theorem.  This is done by what Doyle and McDermott
call @i[possibilization].  Possibilization is a rule of inference
which does essentially that.  It is the point at which
non-monotonicity enters the system.

@SubSection[Individual Theorems No Longer Have Proofs]

Possibilization cannot be formulated as a simple inference rule,
because in non-monotonic logic a theorem can no longer be justified by
exhibiting an object called a "proof".  It is no longer possible to
discuss the theoremhood of one statement in isolation.  Instead it is
necessary to discuss sets of statements, and give a criterion for
whether a set of statements can validly be taken as the set of
theorems of the theory.  Essentially the set of theorems has to
contain the axioms, be closed under the ordinary inference rules, and
also be consistent with possibilization.  Being consistent with
possibilization means that it must include @b[-L-p] whenever it fails
to include @b[-p].

Doyle and McDermott give a complete formalization of this process.  It
is not necessary for the purposes of this paper, I hope.

This phenomenon is general; it results from any form of
non-monotonicity, not just the Doyle-McDermott formulation.

@SubSection[Dependencies]

Since computer programs cannot deal with the infinite and undecidable
set of all theorems of the theory, the best they can do is try to
approximate this set by guessing.  Typically, the program will guess
that anything not yet thought to be a theorem is really not a theorem.
For example, given the default @b[-L-p => p], if @b[-p] is not known
to be a theorem, the program can go ahead and provisionally deduce
@b[p].  But the guess can be wrong; a proof of @b[-p] might be found
at any time.  Then the inference of @b[p] must be @i[retracted], and
so must everything deduced from @b[p].

In order to retract previous inferences when necessary, non-monotonic
reasoning programs employ @i[dependencies].  Dependencies record, for
each inference, what the premises and conclusion were; for each
theorem, what inferences produced it as a conclusion; for each
theorem, what inferences used it as a premise.
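This bookkeeping can be sketched in a few lines of code.  The sketch
below is only an illustrative toy of my own -- the class name, the
belief strings, and the retraction strategy are invented here, and it
is @i[not] Doyle's actual truth maintenance system: each inference
records its premises and conclusion, so that retracting a belief
sweeps away everything deduced from it.

```python
class DependencyNet:
    """Toy dependency records for retractable, guess-based inference."""

    def __init__(self):
        self.beliefs = set()
        self.justifications = []   # (premises, conclusion) records

    def assert_belief(self, p):
        self.beliefs.add(p)

    def infer(self, premises, conclusion):
        """Record an inference and adopt its conclusion, if the premises hold."""
        if all(p in self.beliefs for p in premises):
            self.justifications.append((frozenset(premises), conclusion))
            self.beliefs.add(conclusion)

    def retract(self, p):
        """Withdraw p and, transitively, everything deduced from it."""
        doomed = {p}
        changed = True
        while changed:
            changed = False
            for premises, conclusion in self.justifications:
                if conclusion not in doomed and premises & doomed:
                    doomed.add(conclusion)
                    changed = True
        self.beliefs -= doomed
        self.justifications = [(pr, c) for pr, c in self.justifications
                               if c not in doomed and not (pr & doomed)]

net = DependencyNet()
net.assert_belief("bird(B)")
net.infer({"bird(B)"}, "canfly(B)")       # the flying default fired
net.infer({"canfly(B)"}, "migrates(B)")   # a further provisional conclusion
net.retract("bird(B)")                    # the premise is withdrawn
```

After the retraction, both provisional conclusions are gone along with
the premise; a serious system would also allow a conclusion to survive
on some other justification, which this toy does not attempt.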

@Section[Advantages and Disadvantages of Doyle-McDermott Defaults]

One useful property of this way of representing defaults is that they
can be the result of reasoning.  A default can be deduced through a
process of reasoning, perhaps including other defaults, and then it
can be used to deduce conclusions.  So it is not necessary to go
outside the logical system to represent the mechanisms which produce
new defaults.

However, this representation of defaults has a paradoxical property
when it is used in the modal system S5.  It was shown by McDermott
that possibilization has no effect when added to S5: the theorems of
S5, plus any extra axioms such as defaults, with
possibilization are the same as those without possibilization.  In
particular, the default

@Example[@B[Bird B & -L-(B can fly) => (B can fly)]]

implies the opposite default,

@Example[@B[Bird B & -L-(B cannot fly) => (B cannot fly)]]

Thus, the belief that birds by default can fly
implies the belief that birds by default cannot fly.  There is no way
to express a preference for either belief about the bird's flying
ability over the other belief.  This paradoxical fact arises because it
is possible to contrapose the first default and then back-drive it.

(proof of this?)

Remember that S5 is a strong modal system which contains the axiom
that if something fails to be a theorem, you can prove that it fails
to be a theorem.  There are many contexts in which this is too strong
to be true; for example, any theory which includes arithmetic or set
theory, and therefore contains undecidable statements.  However, in
many limited contexts, it seems plausible to use S5.  So the inability
to represent defaults usefully this way in S5 is a problem.  It also
suggested (to me, at least) that the mechanism is wrong.

So let's consider another way that has been suggested for representing
defaults.


@Section[Reiter's Defaults]

Reiter has a different way of representing defaults.  He makes each
default a separate non-monotonic inference rule.  For example, an
inference rule of the form

@example[If @B[B cannot fly] is not a theorem,
 then deduce @b[B can fly].]

shows the form of a default in his system.
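As an illustrative sketch only -- the predicate names are invented
here, and a real Reiter extension is defined by a fixed point, not a
single pass -- such an inference rule can be read as one closure step
over a set of theorems:

```python
def apply_bird_default(theorems):
    """One pass of the rule: add canfly(x) for each bird x,
    unless cannotfly(x) is already a theorem."""
    out = set(theorems)
    for t in theorems:
        if t.startswith("bird("):
            x = t[len("bird("):-1]
            if f"cannotfly({x})" not in theorems:
                out.add(f"canfly({x})")
    return out

extended = apply_bird_default({"bird(tweety)", "bird(opus)",
                               "cannotfly(opus)"})
```

The rule fires for tweety but is blocked for opus; because the rule is
a rule and not a theorem, there is nothing to contrapose or back-drive.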

This form of default is not susceptible to being back-driven.  It can
be used with no fear of paradox in S5.  However, it does not allow a
default to be the product of reasoning, because reasoning cannot
produce inference rules, only theorems.


@Section[A New Way of Representing Defaults]

To define a system in which defaults can be represented effectively by
theorems even in S5, I define a new modal operator @b[S], which stands for
"should be a theorem".

Whereas @b[Lp] says that @b[p] @i[is] a theorem, @b[Sp] says that
@b[p] @i[ought to be] a theorem.  It doesn't talk about the situation;
it controls the situation.  @b[Lp], if true, asserts that there is
some proof of @b[p], and @b[Lp] is normally proved by exhibiting a
proof of @b[p].  The intent of @b[Sp] is that proving it @i[makes]
@b[p] a theorem.  There isn't any other proof; @b[p] is deduced from
@b[Sp].  "La preuve, c'est moi" -- "I am the proof."

@b[S] does its job by means of an inference rule:

@example[If @b[Sp] is a theorem,
 deduce @b[p]]

which we assume for all @b[p].  This makes a statement of
the form

@example[@b[-L-p => Sp]]

serve as a default.  If @b[-p] is not a theorem, then via
possibilization we have @b[-L-p], and then we get @b[Sp] and then
@b[p].
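The chain of deductions can be mimicked in a toy closure function (the
string encodings of the formulas are my own convention, and this is in
no sense a proof procedure for the modal system):

```python
def close_default(thms, p="p"):
    """Close a theorem set under possibilization, the default
    -L-p => Sp, and the inference rule for S."""
    thms = set(thms)
    if f"-{p}" not in thms:      # possibilization: -p is not a theorem...
        thms.add(f"-L-{p}")      # ...so -L-p becomes a theorem
        thms.add(f"S{p}")        # the default -L-p => Sp
        thms.add(p)              # the rule for S: from Sp, deduce p
    return thms

open_case = close_default(set())     # nothing blocks the default
blocked   = close_default({"-p"})    # -p is a theorem; the default is inert
```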

The difference between @b[L] and @b[S] is that we do @i[not] have an
axiom of the form @B[Sp => p], whereas we do have @B[Lp =>
p].  From @b[-p] we can deduce @b[-Lp], but we cannot deduce @b[-Sp];
nor can we deduce @b[-LSp] from @b[-Lp].  As a result, the default
cannot be back-driven.
We can still necessitate and contrapose it to get @B[-LSp =>
-L-L-p], but we cannot get @b[-LSp] to feed into it.

@section[Inter-Theory Inference Rules]

Metalogic is obtained by generalizing modal logic by adding a theory
parameter to each modality.  The modality @b[S] can be generalized in
this way, to produce @b[SbTh(T,p)].  The inference rule which gives
@b[S] its meaning becomes

@example[If @B[SbTh(T,p)] is a theorem (in any theory),
 deduce @b[p] in theory @b[T]]

Once we have given @b[SbTh] its intended meaning, we can represent a
default for @b[p], in theory @b[T], as

@example[@b[-Th(T,-p) => SbTh(T,p)]]

which can be believed in any theory and still do its job on @b[T].

Notice that the conclusion of this inference is not in the same theory
as the premise.  This is an @i[inter-theory inference rule] which
takes assumption(s) in one or more theories, and makes its conclusion
in a possibly different theory.  The inter-theory inference rule is
my first suggestion for a new primitive logical concept.  It has other
uses as well.

@Chapter[Non-Monotonic Inferability]

Suppose we want to have a default which is undeniable outside of a
limited area of reasoning.  That is, within a particular stage of
reasoning, the default can be overruled by contrary evidence; but
after all inputs of a certain nature are considered, if the default
still stands, it stands forever.  If the conclusion later proves to be
wrong, the relationship between the inputs and the conclusion cannot
be questioned; instead, the inputs themselves become suspect.

This sort of restricted-retractability default can be useful in
connection with the frame problem for motion.  Given a game, and, for
each kind of move in the game, descriptions of what pieces @i[do]
move, how can we conclude that the other pieces do @i[not] move?

The natural method is to use a default: if a piece is not known to
have moved, it has stayed still.  However, if all our reasoning goes
on in one large theory, this can lead to undesirable consequences.
If, for any reason, we decide that a certain piece is not where it
used to be, the default is simply overridden.

To make this concrete, let us suppose that the game is chess, and the
move I am thinking about was to advance a pawn.  I deduce by default
that the other pieces, including my king, are still in their old
positions.  However, if anything should imply that my king is now in
another square, I would notice no contradiction.  I would simply say,
"All right, the king is no longer in the old position; the default is
overridden."  My rational choices would be to consider the line of
reasoning contradictory, or else, if I am only guessing what the move
was, to decide that this guess was wrong.  That is to say, I ought to
conclude that a move of the pawn was @i[not] what took place, if the
result was to move the king.  I should not blithely accept that this
particular pawn move, counter to the usual rule, happened to move the
king.

What I need to conclude is the hypothetical statement that, if the
move was the pawn move that I am considering, then the king is in its
old position.  The reasoning used to reach this conclusion is
non-monotonic, and uses the default rule that any piece which cannot
be shown to be moved by the move under consideration must have stayed
still.  But the hypothetical statement which is the conclusion is
certain, not just a guess.
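A toy rendering of this motion default, with invented names, treats
the move description as listing only the pieces the move is known to
affect; every other piece keeps its square, and the result is held
only as a hypothetical about that candidate move:

```python
def positions_after(positions, moved):
    """Positions after a move, given only what the move is known to change."""
    after = dict(positions)
    after.update(moved)    # pieces the move description says have moved
    return after           # every other piece stays put, by default

before = {"king": "e1", "pawn": "e2"}
after = positions_after(before, {"pawn": "e4"})
```

Here the king's position in `after` is the default conclusion: if some
later argument places the king elsewhere, it is the hypothesis of the
pawn move, not this function's rule, that should be given up.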

This suggests the concept of non-monotonic inferability:
given a theory T containing general information, if we add @b[q] as an
axiom, does @b[p] follow as a theorem?

In monotonic logic, the implication @b[q => p] serves as a
statement of the idea that one statement @b[p] could be deduced from
another statement @b[q].  In non-monotonic logic, it does not do the
job.  It is possible for adding @b[q] as an axiom to a theory to cause
@b[p] to be a theorem, without @b[q => p] being a theorem of the
original theory.  In general, whether @b[p] will be obtained as a
conclusion depends on precisely which axioms we were to add; adding
@b[q] may do the trick only if we do not add @b[r] as well.  In this
case, @b[q => p] could not be a theorem of the original theory, for if
it were, adding @b[q] would allow us to deduce @b[p] regardless of
@b[r].

So if we want to be able to talk about whether adding @b[q] to a
theory @b[T] would allow us to deduce @b[p], we need a new logical
construct to express the idea.  One construct we can use is to define
@B[T<q>] to be the theory which is @b[T] with @b[q] added.  Then we
can simply say that @b[p] is a theorem in that theory:

@example[@B[Th(T<q>,p)]]

@B[Th(T<q>,p)] might be true even though @B[Th(T<q,r>,p)] is false.
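As a toy illustration -- the one-default theory below is invented for
the example, not a general formalization of @B[Th] -- adding @b[q]
alone yields @b[p], while adding @b[r] as well blocks it:

```python
def theorems(axioms):
    """Theorems of a toy theory with one default:
    birds fly unless proven not to."""
    thms = set(axioms)
    if "bird(B)" in thms and "cannotfly(B)" not in thms:
        thms.add("canfly(B)")
    return thms

T = set()                                             # background theory
with_q  = theorems(T | {"bird(B)"})                   # T<q>
with_qr = theorems(T | {"bird(B)", "cannotfly(B)"})   # T<q,r>
```

Here @B[Th(T&lt;q&gt;, canfly(B))] holds but @B[Th(T&lt;q,r&gt;, canfly(B))]
fails, so no implication of the original theory could express the
relation between @b[q] and the conclusion.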

@Chapter[Open-ended Theories and Languages]

One of the purposes of meta-logic is to be able to reason about the
intentions of other beings.  Since in a meta-logical system theories
serve both as the locus for reasoning and as the objects reasoned
about, we ought to use a definition of "theory" which is good for both
purposes.  It must be suitable for the formulation of logical
reasoning.  At the same time, it must serve as a data base for
representing our knowledge and speculations about the beliefs of some
other entity (let's call him Fred).  The standard definition of
"theory" used by logicians is not optimal for this application.

A @b[theory], as considered by a logician, is based on a @i[language]
fixed in advance.  The language is a particular set of constant
symbols, function symbols, relation symbols and variables.  These
determine a set of well formed formulas.  Once the language has been
specified, the logician can then consider different theories, which
are subsets of the set of well formed formulas.  When formalizing a
particular mathematical structure, such as a group, it is traditional
to design a language with a minimally adequate set of symbols, each
adopted for a specific reason, and build the theory on it.

Logicians have defined "theory" in this fashion because it is
mathematically useful.  The Skolem-Lowenheim theorem involves
consideration of many theories with the same language, and it is
essential that the original language be fixed.  Nonstandard analysis
makes use of the distinction between two models of a theory that
cannot be distinguished by any statement that can be formulated in
that theory's language -- because the language has been chosen in
advance and does not include any way to ask certain questions.

When we try to @i[use] a theory as a representation for Fred's
knowledge, these mathematical considerations do not come up.  Instead,
the need to specify the set of symbols in advance becomes only a
hindrance.  Even if we are allowed to change our minds later, and
admit some new symbols, the whole idea of separating the language
from the theory is still excess mental baggage.  It should not be
necessary to consider the question of whether Fred has a predicate
symbol for apples in order to represent the statement that Fred
believes lemons are yellow.

This is not much of a problem as long as we stick to the pure
philosophical idea of representing Fred's beliefs with a logical theory.
For this purpose, we can
in principle consider @i[all] of Fred's beliefs,@footnote[This assumes
we are considering Fred at one moment only.  When we attempt to think
about how Fred's beliefs develop with time, working with fixed
languages may again become inconvenient.] choose a language with
constants and functions that are sufficient for expressing
them, and formulate the theory using this language.  But actually
using a logical theory @i[as a data base] to represent our current
knowledge of and speculations about Fred's beliefs is another matter.
Our knowledge and speculations can change very rapidly, even in the
absence of any change in Fred himself.  They can easily and
unpredictably find a use for any constant or predicate that we
ourselves can imagine; for Fred @i[might] have imagined it too.
Then, if the line of reasoning does not pan out, the symbol
might cease to appear in our thoughts about Fred, never to reappear.

For this application, it is best to formulate the concept of "theory"
so that the set of constants and the set of predicates are open-ended,
just like the set of theorems.  We can dispense entirely with a list
of all possible symbols, and say that the symbols of the theory are
just those that appear in any of the theorems.  The language is no
longer explicitly identifiable, just a byproduct of the theory.
Looked at another way, we choose a standard language with
infinitely many symbols of each kind, most of them with no intended
meaning; then we use this language for all of the theories in our
metalogical system, assigning meanings to the symbols as they are
needed.
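A sketch of such an open-ended theory follows, with a deliberately
naive notion of "symbol" (any identifier occurring in a theorem
string -- my own convention, not part of any formalism in this paper):

```python
import re

class OpenTheory:
    """A theory whose language is just the symbols in its theorems."""

    def __init__(self):
        self.theorems = set()

    def add(self, formula):
        self.theorems.add(formula)

    def symbols(self):
        """Recover the language as a byproduct of the theorems."""
        syms = set()
        for t in self.theorems:
            syms |= set(re.findall(r"[A-Za-z]\w*", t))
        return syms

fred = OpenTheory()
fred.add("yellow(lemon)")
```

Recording that Fred believes lemons are yellow required no prior
decision about whether Fred has a symbol for apples; the language
grows and shrinks with the theorems.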

@Section[Theories vs Approximations to Theories]

To a logician, a theory is the set of all theorems that are to be
believed.  In interesting cases, this set is infinite.  The set may be
described extensionally, with an infinite list of members, or it can
be described intensionally, by specifying axioms and inference rules.
Either way, it is the same set.

An actual reasoning being cannot enumerate the infinite set of all
theorems.  He may find it useful to make statements @i[about] that
set, at times.  But he cannot actually use the infinite set to @i[do]
the reasoning in the theory, because he cannot construct it.

Reasoning can only be done using partial enumerations of those
theorems deduced so far.  We can call these approximations to the
whole, infinite, theory.  Each inference results in a new
approximation, and the hope is that this tends toward the theory
itself in some fashion appropriate for practical purposes.  In
monotonic reasoning, the new approximation contains the old one, and
the union of all possible approximations is the entire theory.  This
is a suitable sense in which the approximations tend toward the
theory; any particular theorem must eventually be included.  For
non-monotonic logic, it is not necessarily so that the generation of
new approximations tends toward any one fixed point.  It may even
loop.  Nevertheless, we are stuck with
approximations anyway because they are the most a finite being can
contain.
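The monotonic case can be sketched as follows (the rule format and
names are my own; a "rule" here is a single-premise pair): each step's
approximation contains the previous one, so the sequence grows toward
the full closure, and any particular theorem is eventually included.

```python
def step(approx, rules):
    """One inference step: the new approximation contains the old one."""
    return set(approx) | {concl for prem, concl in rules if prem in approx}

rules = [("p", "q"), ("q", "r")]   # if p then q; if q then r
a0 = {"p"}
a1 = step(a0, rules)
a2 = step(a1, rules)
```

In the non-monotonic case no analogue of the containment
`a0 <= a1 <= a2` is guaranteed, which is exactly the difficulty
described above.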

Logicians formalizing reasoning do not normally consider the concept
of an approximation to the theory.  They can succeed in formalizing
logic this way because they are formalizing it after the fact.  Their
purposes are served by statements @i[about] the results of reasoning;
they do not need to @i[implement] reasoning, because they use @i[their
own] reasoning ability to do that!  For them, statements about the
entire infinite theory are sufficient.

In a meta-logical system, theories are intended to support reasoning
as well as serve to be reasoned about.  Since the reasoning process
actually involves approximations, we must make the approximations
explicit when we talk about the reasoning.  However, we must not
confuse the approximations with the infinite theory which is their
goal.  Previous work on meta-logical systems has usually confused the
two.  The researcher typically believes he is working with one, or the
other; but he tries to make it do both jobs, and paradoxes result.

Ordinary reasoning within a theory @b[T] can go on without explicit
recognition of the fact that an approximation is being used to
implement it.  The point where such recognition is necessary is in
the evaluation of @B[Th(T,p)].  In order to give this expression a
truth value, we need an approximation for @b[T].  Inferring a
theorem in @b[T], whether by ordinary reasoning going on within @b[T]
itself or by means of an inter-theory inference rule, creates a new
approximation for @b[T].  From then on, references to @b[Th(T,p)]
ought to use the new approximation, and so should further reasoning
within @b[T].

[Hmm.  It looks like this forces us to believe that there are
approximations inside the system, but not to have any way in
the system to talk about them.  Which seems wrong.  I feel that there
must be some reason why it is important to formalize the approximation
mechanism as well, but I can't get my hands on it]

@Chapter["Concepts" and Computed Statements]

@Section[Conscious Access to Dependencies]

*******
jmc comments on Stallman

practice

"Doyle and McDermott give a complete formalization of this process".
See Martin Davis's grumbles in the AI Journal issue.  Apart from that
"completeness" is a technical term in logic, and you surely don't mean
to use the word in its technical meaning.

"Remember that S5 is a strong modal system which contains the axiom
that if something fails to be a theorem, you can prove that it fails
to be a theorem".  This isn't correct.  If the S5 axioms are added
to a system which admits the interpretation of necessity as provability,
then the system becomes inconsistent.  This is one of the ways of
looking at Goedel's theorem.  I'll explain further if you like.